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We describe new approaches for distances between pairs of 2- 
dimensional surfaces (embedded in 3 dimensional space) that use 
local structures and global information contained in inter-structure 
geometric relationships. We present algorithms to automatically de- 
termine these distances as well as geometric correspondences. 
This is motivated by the aspiration of students of natural science 
to understand the continuity of form that unites the diversity of 
life. At present, scientists using physical traits to study evolu- 
tionary relationships among living and extinct animals analyze data 
extracted from carefully defined anatomical correspondence points 
(landmarks). Identifying and recording these landmarks is time con- 
suming and can be done accurately only by trained morphologists. 
This renders these studies inaccessible to non-morphologists, and 
causes phenomics to lag behind genomics in elucidating evolution- 
ary patterns. 

Unlike other algorithms presented for morphological correspondences 
our approach does not require any preliminary marking of special fea- 
tures or landmarks by the user. It also differs from other seminal 
work in computational geometry in that our algorithms are polyno- 
mial in nature and thus faster, making pairwise comparisons feasible 
for significantly larger numbers of digitized surfaces. We illustrate 
our approach using three datasets representing teeth and different 
bones of primates and humans, and show that it leads to highly 
accurate results. 

homology | phenomics | morphometries | Procrustes | Mobius transforma- 
tions | automatic species recognition 

To document and understand physical and biological phe- 
nomena (e.g., geological sedimentation, chemical re- 
actions, ontogenetic development, speciation, evolutionary 
adaptation, etc.), it is important to quantify the similarity or 
dissimilarity of objects affected or produced by the phenom- 
ena under study. The grain size or elasticity of rocks, geo- 
graphic distances between populations, or hormone levels and 
body masses of individuals - these can be readily measured, 
and the resulting numerical values can be used to compute 
similarities/distances that help build understanding. Other 
properties like genetic makeup or gross anatomical structure 
can not be quantified by a single number; determining how to 
measure and compare these is more involved [341 1361 1501 153] . 
Representing the structure of a gene (through sequencing) or 
quantification of an anatomical structure (through the digi- 
tization of its surface geometry) leads to more complex nu- 
merical representations; even though these are not measure- 
ments allowing direct comparison with their counterparts for 
other genes or anatomical structures, they represent an es- 
sential initial step for such quantitative comparisons. The 1- 
dimensional, sequential arrangement of genomes and the dis- 
crete variation (four nucleotide base types) for each of thou- 
sands of available correspondence points help reduce the com- 
putational complexity of determining the most likely align- 
ment between genomes; alignment procedures are now in- 
creasingly automated 18 . The resulting, rapidly generated 
and massive data sets, analyzed with increasing sophistication 
and flexible in-depth exploration due to advances in comput- 



ing technology, have lead to spectacular progress. For instance 
phylogenetics has begun to unravel mysteries of large scale 
evolutionary relationships experienced as extraordinarily dif- 
ficult by morphologists [f9| . 

Analyses of massive developmental and genetic data sets 
outpace those on morphological data. The comparative study 
of gross anatomical structures has lagged behind mainly be- 
cause it is harder to determine corresponding parts on dif- 
ferent samples, a prerequisite for measurement. The diffi- 
culty stems from the higher dimension (2 for surfaces vs. 1 
for genomes), the continuous rather than discrete nature of 
anatomical objects|J] and from large shape variations. 

In standard morphologists' practice, correspondences are 
first assessed visually; then, some (10 to 100, at most) feature 
points can usually be defined as equivalent and/or identified 
as landmarks. Just as comparisons of tens of thousands of nu- 
cleotide base positions are used to determine similarity among 
genomes, the coordinates of these dozens of feature points (or 
measurements they define) are used to evaluate patterns of 
shape variation and similarity /difference [42]. However, as 
stated in 1936 by G. G. Simpson, the paleontologist chaperon 
of the Modern Synthesis in the study of evolution, the "diffi- 
culty in acquiring personal knowledge" f |21j p. 3) of morpho- 
logical evidence limits our understanding of the evolutionary 
significance of morphological diversity; this remains true to- 
day. New techniques for generating and analyzing digital rep- 
resentations have led to major advances (see, e.g. [2511221 [23] ). 
but they typically still require determinations of anatomical 
landmarks by observers whose skill of identifying anatomical 
correspondences takes many years of training. 

Several groups have sought to determine automatic cor- 
respondence among morphological structures. Existing suc- 
cessful methods typically introduce an effective dimensional 
reduction, using, e.g., 2-D outlines and/or images [54,, or, in 
one of the few studies attempting automatic biological cor- 
respondences in 3-D as a method for evolutionary morpholo- 
gists, "automatically-detected crest lines" [26] on surfaces ob- 
tained by CT-scans to register modern human skulls to each 
other [27] or to pre-Neanderthal, Homo heidelbergensis skulls 
[28] ; another example is Wiley et al [25]. Studies using 2- 
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1 These differences may seem innocuous but they lead to an exponential increase in the size of the 
"search spaces" to be explored by comparison algorithms. 
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D outlines or images sacrifice a lot of the original geometric 
information available in the 3-D objects on which they are 
based; such specifically limited representations cannot easily 
be incorporated into other studies. More generally and most 
importantly, none of these methods are independent of user 
input. When outlines or standard 2-D views are used, precise 
observations of the 3-D anatomical structures are required by 
the trained technician who creates the outlines or 2-D views 
[481147] . Several methods for 3-D alignment use Iterative Clos- 
est Point (ICP) algorithms [21] that require observer input 
to fix an initial guess (then further improved via local opti- 
mization); ICP-generated correspondences can also have large 
distortions and discontinuities of shape. In [3 [9] surfaces are 
matched by using the Gromov-Hausdorff distances between 
them, and applications to several shape analysis problems are 
given. However, Gromov-Hausdorff distances are hard to com- 
pute, and have to be approximated; the gradient descent op- 
timization used in practice does not guarantee convergence to 
a global (rather than local) minimum. 

Determination of correspondences or similarities among 
3-D digitizations of general anatomic surfaces that is both 1) 
fully automated and 2) computationally fast (to handle the 
large data sets that are becoming increasingly available as 
imaging technologies become more widespread and efficient 
[40]) is still elusive. Our aim here is to remedy this by fully au- 
tomating the determination of correspondences among gross 
anatomical structures. Success in this pursuit will help bring 
to phenomic studies the rate, objectivity and exhaustiveness 
of genomic studies. Large scale initiatives to phenotype model 
species after systematically knocking out each gene |38j . as 
well as analysis of computational simulations of organogenesis 
[39| stand to greatly benefit from automating the determina- 
tion of correspondence among, and measurement of, morpho- 
logical structures. 

In this paper, we describe several new distances between 
surfaces that can be used for such fully automated anatomic 
correspondences, and we test their relevance for biologically 
meaningful tasks on several anatomical dataset examples 
(high resolution digitizations of bones and teeth). 

The paper is organized as follows. Section 1 gives the 
mathematical background for our algorithms: conformal ge- 
ometry and optimal mass transportation (also known as Earth 
Mover's distance). In section 2, we use these ingredients to 
define new distances or measures of dissimilarity, including a 
generalization to surfaces of the Procrustes distance. Section 
3 presents the results obtained by our algorithms for three 
different morphological data sets, and an application. 

No technical advance stands on its own; this paper is no 
exception. Conformal geometry is a powerful mathematical 
tool (permitting the reduction of the study of surfaces em- 
bedded in 3-D space to 2-D problems) that has been useful in 
many computational problems; [4] provides an introduction 
to both theory and algorithms, with many applications, in- 
cluding the use of conformal images of anatomical structures, 
combined with user prescribed landmarks and/or special fea- 
tures, for registration purposes, seeking "optimal" correspon- 
dence between pairs of surfaces [BJ. Earth Mover's distances 
[5] and continuous optimal mass transportation [I] have been 
used in image registration and for more general image analysis 
and parameterization [2]; in [8] (quadratic) mass transporta- 
tion is used to relax the notion of Gromov-Hausdorff distance. 
Procrustes distances for discrete point sets are familiar to mor- 
phologists and other researchers working on shape analysis 
[231 [42] . The mathematical and algorithmic contribution of 
our work is the combination in which we use and generalize 
these ingredients to construct novel distance metrics, paired 



with efficient, fully automatic algorithms not requiring user 
guidance. They open the door to new applications requiring 
a large number of distance computations. 

1. The mathematical components 

Conformal geometry. A mapping ip from one 2-dimensional 
(smooth) surface 5 to another, 5', defines for every point 
p £ S a corresponding point p(p) £ S' . If the mapping is 
smooth itself, it maps a smooth curve F on 5 to a corre- 
sponding smooth curve T' on 5' that is called the image of 
r. Two curves Pi and P2 on 5 that intersect in a point s are 
mapped to curves F[, T' 2 that intersect as well, in s' = p(s). 
Consider the two (straight) lines i\ and I2 that are tangent 
to the curves Ti and T2 at their intersection point s; the an- 
gle between Ti and Y2 at s is then taken to mean the angle 
between the two lines t\ and £2; similarly, the angle between 
the curves Ti and F 2 (at s' = <p(s)) is the angle between their 
tangent lines at s'. The mapping p is called conformal if for 
any two smooth curves Ti and T2 on S, the angle between 
their images V'i and T'2 is the same as that between Ti and T2 
at the corresponding intersection point. 

Riemann's uniformization theorem f4] guarantees that ev- 
ery (reasonable) 2-d surface S in our standard 3-d space 
that is a disk-type surface (i.e. that has a boundary but 
no holes) can be mapped conformally to the 2-d unit disk 
D = {z I 2 = x + iy, \z\ < I}, with the boundary of the disk 
corresponding to the boundary of <s[^] This mapping is called 
"conformally flattening" [^] This flattening process is accom- 
panied by area distortion; the conformal factor f(x, y) on the 
disk, varying from point to point, indicates the area distortion 
factor produced by the operation. 

One important practical implication of this theorem is 
that the family of conformal maps between two surfaces can be 
characterized naturally via the flattened representations of the 
surfaces: if 7 is a conformal mapping from S to 5', and p (ip') 
is a flattening (i.e. a conformal map to the disk D) of S (5'), 
then the family of all possible conformal mappings from S to 
S' is given by 7 = ip' -1 omoip, where m ranges over all the con- 
formal bijective self-mappings of the unit disk D. We shall call 
such m disk-preserving Mobius transformations; they consti- 
tute a group, the disk-preserving Mobius transformation group 
M. Each m in M is characterized by 3 parameters and given 
by the closed-form formula m{z) = e l9 (z — a)(l — za) _1 , where 
6 6 [0,27r), |a| < 1. For our applications, it is important 
that the flattening process (starting from a triangulated digi- 
tized version of 5) and more importantly the disk-preserving 
Mobius transformations can be computed fast and with high 
accuracy; for more details, see [161 115]|?] Note that the flat- 
tening map of a surface S is not unique; one can choose any 
arbitrary point of S to be mapped to the origin of the disk 
D, and any direction through this point to become the "x- 
axis". The transition from choosing one (center, direction) 
pair to another is simply a disk-preserving Mobius transfor- 
mation. It is convenient to equip the disk D with its hyper- 
bolic measure dn(x,y) = [1 — (x 2 + y 2 )]~ 2 dx dy, invariant 
with respect to Mobius transformations; correspondingly, we 
set f (x, y) = [1 — (x 2 + y 2 )] 2 f(x, y), so that f dr\ = f dxdy . 

Optimal mass transportation. An integrable function fj, is a 
(normalized) mass distribution on a domain D if fJ.(u) > 



2 We shall restrict ourselves to this case here, although our approach is more general; see 15 ). 
3 The uniformization theorem holds for more general surfaces as well. For instance, surfaces without 
holes, handles or boundaries can be mapped conformally to a sphere; if one point is removed from 
such a surface, it can be mapped conformally to the full plane. Surfaces with holes or handles can 
still be conformally flattened to a piece of the plane. 

4 If the digitization of the surface is given as a point cloud, standard fast algorithms can be used to 
determine an appropriate (e.g. Delauney) triangulation. 



is well defined for each u G D, and J D fi{u) du — 1. If r 
is a differentiable bijection from D to itself, the mass dis- 
tribution fj,' — T,fi on D defined by fi(u) = /j,'(t(u)) J t (u) 
(where J T is the Jacobian of the map r), is the transportation 
(or push- forward) of fi by r in the sense that, for any arbi- 
trary (non pathological) function F on D, J F(u) //(w) du = 
J D F(t(u)) du. The total transportation effort is given 
by S T = f D d(u, t(u)) fj.(u) du, where d(u, v) denotes the dis- 
tance between two points u and v in D. 

If two mass distributions /i and v on D are given, then the 
optimal mass transportation distance between fi and v (in 
the sense of Monge, see [10], p. 4) is the infimum of the 
transportation effort S T , taken over all the measurable bijec- 
tions t from D to D for which v equals the transportation 
of /i by t. This set of bijections is hard to search; the de- 
termination of an optimal mass transportation scheme be- 
comes more tractable if the mass "at u" need not all end 
up at the same end point. One then considers measures 
7r on D x D with marginals \x and v (this means that for 
all continuous functions F, G on D, J DxD F(u) dn(u,v) = 
J D F(u)^j,(u)du and J DxD G(v) dir(u, v) = J D G(v) u(v) dv); 
the optimal mass transportation in this more general Kan- 
torovitch formulation is the infimum over all such measures ir 
of E w — J DxD d(u, v) dn(u, v). A comprehensive treatment of 
optimal mass transport is in [10] . 

2. New distances between 2-dimensional surfaces 
Conformal Wasserstein distance (cW). One can use opti- 
mal mass transport to compare conformal factors f and f 
obtained by conformally flattening two surfaces, S and S' . 
If m is a disk-preserving Mobius transformation, then f and 
m*f = f om" 1 are both equally valid conformal factors for S. 
A standard approach to take this into account is to "quotient" 
over M, which leads to the conformal Wasserstein distance: 



T) cW (S,S')= inf 



inf / d(z, z ,s )div(2 

i6n(m,f,f'}JDxD 
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where d(-, •) is the (conformally invariant) hyperbolic dis- 
tance^] in D; T) c ^y satisfies then all the properties of a met- 
ric [12]. In particular, T) c ^y(5,5') = iff 5 and 5' are 
isometric. However, computing this metric requires solving 
a Kantorovitch mass-transportation problem for every can- 
didate m; even though the whole procedure has polynomial 
runtime complexity, it is too heavy to be used in practice for 
large datasets. 

Conformal Wasserstein neighborhood dissimilarity distance 
(cWn). We propose another natural way to use Kantorovich's 
optimal mass transport to compare surfaces 5 and 5'. Instead 
of determining the most efficient way to transport "mass" 
f from z to z' , we can quantify how dissimilar the "land- 
scapes" are, defined by f and f' near z, resp. z' , and re- 
place the distance d(-,-) by a measure of neighborhood dis- 
similarity. The neighborhood N(0, R) around is given by 
N(0,R) = { z ; \z\ < R }; neighborhoods around other points 
are obtained by letting the disk-preserving Mobius transfor- 
mations act on N(0, R): for any m in M such that z = m(0), 
N(z, R) is the image of N(0, R) under the mapping m. Next 
we define the dissimilarity between f at z and f' at z : 
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\f(w) — f'(m(w)) | dr/(w) 



It is straightforward to check that for all m,m' in A4, 
d.m ; f , {m{z),m'{z')) — d^ { , (z, z'). We now use optimal 



transport, and define the conformal Wasserstein neighborhood 
dissimilarity distance between f and f': 

f c Wn( 5 . S ') = irlf , / d tt ,(z,z')dTr(z,z'), [2] 
•n-en(f,f ) Jdxd 

where the superscript recalls that this definition depends on 
the choice of the parameter R. For a proof that this defines a 
true distance between (generic) surfaces S and S' , and further 
mathematical properties, see |12II13] . One practical difference 
with T) c -\y is that |2| requires solving only one Kantorovitch 
mass-transportation problem once the special dissimilarity 
cost is computed, resulting in a simpler optimization prob- 
lem. To implement the computation of these distances, we 
discretize the integrals and the optimization searches, picking 
collections of discrete points on the surfaces; the minimizing 
measure 7r in the definition of I>^y n (iS, 5') can then be used 
to define a correspondence between points of S and 5'. 

Continuous Procrustes distance between surfaces (cP). 

Both cW and cWn are intrinsic: they use only information 
"visible" from within each surface, such as geodesic distances 
between pairs of points; consequently they do not distinguish a 
surface from any of its isometric embeddings in 3-D. The con- 
tinuous Procrustes (cP) distance [H] described in this section 
uses some extrinsic information as well; it fails to distinguish 
two surfaces only if one is obtained by applying to the other 
a rigid motion (which is a very special isometry). 

The (standard) Procrustes distance is defined between 
discrete sets of points X = (X n ) n= i i ... i jv C 5 and Y = 
(Y n ) n =i ,....n C S' by minimizing over all rigid motions: 



dp(X,Y) 



L R rig. mot. 
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where | • | denotes the standard Euclidean norm, rj Often X, Y 
are sets of landmarks on two surfaces, and dp(j£,Y) is inter- 
preted as a distance between these surfaces. This practice has 
several drawbacks: 1) dp(X, Y) depends on the (subjective) 
choice of X, Y, which makes it a not necessarily "well-defined" 
or easily reproducible proxy for a surface distance; 2) the (rel- 
atively) small number of N landmarks on each surface disre- 
gards a wealth of geometric data; 3) identifying and recording 
the x n , Vn is time consuming and requires expertise. 

We eliminate all these drawbacks by a landmark-free 
approach, introducing the continuous Procrustes distance. In- 
stead of relying on experts to identify "corresponding" dis- 
crete subsets of S and S' , we consider a family of continuous 
maps a : S — > 5' between the surfaces, and rely on optimiza- 
tion to identify the "best" a. The earlier exact correspondence 
of one point Y n to one point X n , and the (tacit) assumption 
that X (Y) collectively represent all the noteworthy aspects 
of S (S') in a balanced way, are recast as requiring that the 
"correspondence map" be area-preserving |14| . that is, for ev- 
ery (measurable) subset Q of S, J n dAs ~ J a ^ n < ) dA$i , where 
dAs and dA$i are the area elements on the surfaces induced 
by their embeddings in R 3 . We denote A(S,S') the set of all 
these area-preserving difieomorphisms. For each a in ^4(5, 5') , 
we set d(S,S',a) 2 = mm R r ig. mo t. is !^( :E ) — a(x)\ 2 dAs; the 
continuous Procrustes distance between S and S' is then 



T> P (S,S') 



inf d(S,S', 



This is the geodesic distance on D induced by the hyperbolic Riemann metric tensor dr] on 
D. The geodesic from the origin to any point z in D is the straight line connecting them, and 
d(0, Z )=ln[(l + | Z |)/(l-| Z |)]. 

^It is interesting to note that in 1111 , a Kantorovich version of dp was introduced, and its equiva- 
lence to the Gromov-Wasserstein distance (when the shapes are endowed with Euclidean distances) 
was proved. 



This defines a metric distance on the space of surfaces (up to 
rigid motions: for congruent surfaces the distance is 0) |14j . 
Minimizing over rigid motions is easy; there exist closed form 
formulas, as in the discrete case. But the second set over 
which to minimize, A(S, S'), is an unwieldy, formally infinite- 
dimensional manifold, hard to explor«Q This is an optimal 
transport problem again, now in the much harder Monge for- 
mulation. For "reasonable" surfaces (e.g., surfaces with uni- 
formly bounded curvature), transformations a close to optimal 
are close to conformal [14) . This crucial insight allows limit- 
ing the search to the much smaller space of maps obtained by 
small deformations of conformal maps. Concretely, we com- 
pose a conformal map (represented as a Mobius transforma- 
tion) m £ M with maps \ an< A Q> where g is a smooth map 
that roughly aligns high density peaks, and x is a special de- 
formation (following [?I]) using local diffusion to make x°Q° m 
area preserving (up to approximation error). For each choice 
of peaks p, p' in the conformal factors of 5, 5', the algorithm 

1) runs through the 1-parameter family of m that map p to p'; 

2) constructs a map g that aligns the other peaks, as best pos- 
sible; and 3) computes d(5, S' , g o m). Repeat for all choices 
of p, p'; the gom that minimizes d is then deformed to be area 
preserving, producing the map a = \ o go m; d(S, S' ,a) and 
a are our approximate T)p(S,S') and correspondence map, 
respectively. (More in Supplementary Materials.) 

3. Application to anatomical datasets 

To test our approach, we used three independent datasets, 
representing three different regions of the skeletal anatomy, 
of humans, other primates, and their close relatives. Digi- 
tized surfaces were obtained from High Resolution X-ray Com- 
puted Tomography (piCT) scans (see Supplementary Materi- 
als) of (A) 116 second mandibular molars of prosimian pri- 
mates and non-primate close relatives, and (B) 57 proximal 
first metatarsals of prosimian primates, New and Old World 
monkeys, and (C) 45 distal radii of apes and humans. For 
every pair of surfaces, the output of our algorithms consists 
of 1) a correspondence map for the whole surface (i.e. not 
just a few points), and 2) a non-negative number giving their 
dissimilarity (where zero means they are isometric or congru- 
ent). Typical running times for a pair of surfaces were ~ 20 
sec. for cP, ~ 5 min. for cWn. To evaluate the performance 
of the algorithms, we compared the outcomes to those deter- 
mined independently by morphologists. Using the same set of 
digitized surfaces, geometric morphometricians collected land- 
marks on each, in the conventional fashion [42], choosing them 
to reflect correspondences considered biologically and evolu- 
tionarily meaningful (see Supplementary Materials). These 
landmarks determine "discrete" Procrustes distances for ev- 
ery two surfaces (see § 2), here called Observer-Determined 
Landmarks Procrustes (ODLP) distances. For each of the 
three distances we obtain thus a (symmetric) matrix. 

Comparing the distance (dissimilarity) matrices. We compare 
cWn- and cP- with ODLP-matrices in two different ways. Sets 
of distances are far from independent, and it is traditional to 
assess the relationship between distance matrices by a Man- 
tel correlation analysis [43] : first correlate the entries in the 
two square arrays, and then compute the fraction, among all 
possible relabelings of the rows/columns for one of them, that 
leads to a larger correlation coefficient; this Mantel stgnifi- 
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0.166 


not applicable 



Table 1. Results of Mantel correlation analysis for cP and cWN versus ODLP distances. 



cance is a stronger indicator than the correlation coefficient 
itself. Table 1 gives the results for our datasets. 
In all cases the Mantel significance between ODLP and cP 
distances is higher than that between ODLP and cWn. This 
indicates that distances computed using cP match those 
determined by morphometricans' better than those using 
cWn. 

Figure 1 illustrates the relationship of cP, cWn, and ODLP 
distances in a different way. In each of the two square matri- 
ces (corresponding to cP and cWn, each vs. ODLP), the color 
of each pixel indicates the value of the entry (using a red-blue 
colormap, with deep blue representing 0, and saturated red 
the largest value); upper right triangular halves correspond to 
cP or cWn; (identical) lower left halves to ODLP. The same 
ordering of samples is used in the three cases, with samples or- 
dered so that nearby samples typically have smaller distances. 
This type of display is especially good to compare the struc- 
ture of two distance matrices for small distances, often the 
most reliable]^] Note the better symmetry along the diagonal 
for ODLP/cP comparison on the left: in this comparison, as 
in the previous one, cP outperforms cWn. 




Fig. 1. For small distances, the structures of the matrices with cP-, cWn-distances 
and distances based on Observer Landmarks (ODLP) are very similar, with cP (on 
the right) the most similar to ODLP. The dataset illustrated here is dataset (A). 



Comparing scores in taxonomic classification. Accurately 
placed ODL usually result in smaller ODLP distances between 
specimens representing individuals of the same species/genus 
than between individuals of different species/genera. 
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Genera 


24 
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91.9 


68 


13 


76.3 


79.9 


88.1 


50.8 
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84.4 


77.8 


68.9 


Family 


17 


92.5 


94.3 


75.1 
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83.6 


91.8 


93.4 


68.9 


not applicable 


Above family 
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94.8 


95.7 


83.3 
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100 


100 


100 


98.4 


not applicable 



To assess whether this holds as well for the algorithmic cP and 
cWn distances, we run three taxonomic classification analyses 
on each data set, one using ODLP distances, and two using 



It can be viewed as the continuous analog to the exponentially large group of permutations. 
8 In some modern data analysis methods, such as diffusion-based or graph- La placian based methods, 
only the small distances are retained, to be used in spectral methods that "knit" the larger scale 
distances of the dataset together more reliably. 

^We do not claim this is a new/alternative method for automatic species or genus identification. 
Reliable automatic species recognition uses, In addition, auditory, chemical, non-geometric mor- 
phological and other data, analyzed by a range of methods; see e.g. 47 51 49 and references. 
Comparison of taxonomic classification based on human-expert-generated vs. algorithm-computed- 
distances is meant only as a quantitative evaluation based on biology rather than mathematics. 



cP and cWn distances^] with a "leave one out" procedure: 
each specimen (treated as unknown) is assigned to the taxo- 
nomic group of its nearest neighbor among the remainder of 
the specimens in the data set (treated as known). Table 2 lists 
success rates (in %) for three different classification queries 
for the three datasets. For each dataset N is the number of 
objects; for each query # is the number of groups. Classifi- 
cations based on the cP distances are similar in accuracy to 
those based on the ODLP distances, outperforming the cWn 
distances for all three of our anatomic datasets. 
Note: a similar classification based on topographic variables 
is less accurate: for the 99 teeth belonging to 24 genera, only 
54 (54 %) were classified correctly with a classification based 
on the four topographic variables Energy, Shearing Quotient, 
Relief Index, OPC. (Details in Supplementary Materials.) 

Comparing the correspondence maps. Morphometric analy- 
ses are based on the identification of corresponding landmark 
points on each of S and S'; the cP algorithm constructs a 
correspondence map a from S to S' . (The correspondence in- 
duced by cWn is less smooth and will not be considered here.) 
For each landmark point L on S, we can compare the location 
on S' of its images a(L) with the location of the corresponding 
landmark points L' . Fig. 2 shows that the "propagated" land- 
marks a(L) typically turn out to be very close to those of the 
"true" landmarks L' . (More in Supplementary Materials.) 
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Fig. 2. Observer-placed landmarks can be propagated from structure M (A) using 
cP determined correspondence maps (B) to another specimen N (C). The similar- 
ity between propagated landmarks in (C) and observer placed landmarks in (D) on 
N shows the success of the method, and makes explicit the geometric basis for the 
observer-determinations. 



An application. These comparisons show our algorithms cap- 
ture biologically informative shape variation. But scientists 
are interested in more than overall shape! We illustrate how 
correspondence maps could be used to analyze more spe- 
cific features. In comparative morphological and phylogenetic 
studies, anatomical identification of certain features (e.g., par- 
ticular cusps on teeth) is controversial in some cases; an ex- 
ample of this is the distolingual corner of sportive lemur (Lep- 
ilemur) lower molars in our data set (A) [371 136] . 

In such controversial cases, transformational homology 
|17| hypotheses are usually supported by a specific compara- 
tive sample or inferred morphocline 35, 52, 36 . Lepilemur is 
thought by some researchers to lack a cusp known as an en- 
toconid (Fig. 3) but to have a hypertrophied metastylid cusp 
that "takes the place" of the entoconid [36] in other taxa. Yet, 
in comparing a Lepilemur tooth to a more "standard" primate 
tooth, like that of Microcebus, both seem to have the same 
basic cusps; alternatives to the viewpoint of |36 have there- 
fore also been argued in the literature [37]. However, another 
lemur, Megaladapis (now extinct), arguably a closer relative 
of Lepilemur than Microcebus, has an entoconid that is very 
small and a metastylid that is rather large, thus providing 
an evolutionary argument supporting the original hypothesis. 
(For more details, see Supplementary Materials.) Such argu- 
ments can now be made more precise. We can propagate (as 
in Fig. 2) landmarks from the Microcebus to the Lepilemur 
molar; this direct propagation matches the entoconid cusp 
of Microcebus with the controversial cusp of Lepilemur (Fig. 
3, path 1), supporting |37) . In contrast, when we propagate 
landmarks in different steps, either from Microcebus to Megal- 
adapis and then to Lepilemur (Fig. 3, path 2), or through 
the extinct Adapis and extant Lemur (Fig. 3, path 3), the 
Lepilemur metastylid takes the place of the Microcebus ento- 
conid, supporting [36]. Automatic propagation of landmarks 
via mathematical algorithms recenters the controversy on the 
(different) discussion of which propagation channel is most 
suitable. 
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Fig. 3. Observer-placed landmarks on a tooth of Microcebus are propagated us- 
ing cP-determined correspondence maps to a tooth of Lepilemur. Path 1 is direct, 
Path 2 and 3 have intermediate steps, representing step-wise propagation between 
teeth of other taxa. 



Summary and Conclusion. New distances between 2-D sur- 
faces, with fast numerical implementations, were shown to 
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lead to fast, landmark-free algorithms that map anatomi- 
cal surfaces automatically to other instances of anatomically 
equivalent surfaces, in a way that mimics accurately the de- 
tailed feature-point correspondences recognized qualitatively 
by scientists, and that preserves information on taxonomic 
structure as well as observer-determined-landmark distances. 
Moreover, the correspondence maps thus generated can incor- 
porate, in their tracking of point features, evolutionary rela- 
tionships inferred to link different taxa together. 

Our approach makes morphology accessible to non- 
specialists and allows the documentation of anatomical varia- 
tion and quantitative traits with previously unmatched com- 
prehensiveness and objectivity. More frequent, rapid, ob- 
jective, and comprehensive construction of morphological 



datasets will allow the study of morphological diversity's evo- 
lutionary significance to be better synchronized with studies 
incorporating genetic and developmental information, leading 
to a better understanding of anatomical form and its genetic 
basis, as well as the evolutionary processes that have con- 
tributed to their diversity among living things on earth. 
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